Large Scale Classification Based on Combination of Parallel SVM and Interpolative MDS

نویسندگان

  • Sun Zhanquan
  • Geoffrey Fox
چکیده

(1 Key Laboratory for Computer Network of Shandong Province, Shandong Computer Science Center, Jinan, Shandong, 250014, China 2 School of Informatics and Computing, Pervasive Technology Institute, Indiana University Bloomington, Bloomington, Indiana, 47408, USA) [email protected], [email protected] Abstract: With the development of information technology, the scale of electronic data becomes larger and larger. Data deluge occurs in many kinds of application fields. How to explore the useful information from the large scale dataset is a very important issue. Data mining is just to take on the task. Support Vector Machines (SVM) is a powerful classification and regression tools of data mining. It has been widely studied by many scholars and applied in many kinds of practical fields. But its compute and storage requirements increase rapidly with the number of training vectors, putting many problems of practical interest out of their reach. For applying SVM to large scale data mining, parallel SVM are studied and some parallel SVM methods are proposed. Currently parallel SVM methods are all based on classical MPI model. It is not easy to be used in practical, especial to large scale data-intensive data mining problems. MapReduce is an efficient distribution computing model to process large scale data mining problems. In this paper, parallel SVM based on iterative MapReduce model Twister is studied. Feature extraction is an efficient means to decrease SVM’s computing cost. Some feature extraction methods have been proposed, such as PCA, SOM network, and Multidimensional Scaling (MDS) and so on. But PCA can only measure the linear correlation between variables. The computation cost of SOM network is very expensive. In this paper, MDS is used to reduce the dimension of sample features, and interpolation MDS is used to improve computation speed. Parallel SVM combines with MDS to analyze large scale classification problems. The efficiency of the method is illustrated through analyzing practical problems.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Developing a New Method in Object Based Classification to Updating Large Scale Maps with Emphasis on Building Feature

According to the cities expansion, updating urban maps for urban planning is important and its effectiveness is depend on the information extraction / change detection accuracy. Information extraction methods are divided into two groups, including Pixel-Based (PB) and Object-Based (OB). OB analysis has overcome the limitations of PB analysis (producing salt-pepper results and features with hole...

متن کامل

Nasullah Khalid Alham

Machine learning techniques have facilitated image retrieval by automatically classifying and annotating images with keywords. Among them Support Vector Machines (SVMs) are used extensively due to their generalization properties. However, SVM training is notably a computationally intensive process especially when the training dataset is large. In this thesis distributed computing paradigms have...

متن کامل

A new classification method based on pairwise SVM for facial age estimation

This paper presents a practical algorithm for facial age estimation from frontal face image. Facial age estimation generally comprises two key steps including age image representation and age estimation. The anthropometric model used in this study includes computation of eighteen craniofacial ratios and a new accurate skin wrinkles analysis in the first step and a pairwise binary support vector...

متن کامل

Study on Parallel SVM Based on MapReduce

Support Vector Machines (SVM) are powerful classification and regression tools. They have been widely studied by many scholars and applied in many kinds of practical fields. But their compute and storage requirements increase rapidly with the number of training vectors, putting many problems of practical interest out of their reach. For applying SVM to large scale data mining, parallel SVM are ...

متن کامل

SUBCLASS FUZZY-SVM CLASSIFIER AS AN EFFICIENT METHOD TO ENHANCE THE MASS DETECTION IN MAMMOGRAMS

This paper is concerned with the development of a novel classifier for automatic mass detection of mammograms, based on contourlet feature extraction in conjunction with statistical and fuzzy classifiers. In this method, mammograms are segmented into regions of interest (ROI) in order to extract features including geometrical and contourlet coefficients. The extracted features benefit from...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2012